Panel: Computational Linguistics Research on Philippine Languages
Abstract
This paper describes computational linguistics activities on Philippine languages. The Philippines is an archipelago with a vast number of islands and numerous languages. Understanding, representing, and implementing these languages requires enormous work. Extensive work has been done on understanding at least some of the major Philippine languages, but little has been done on the computational side, and most of that has been for machine translation.
1 Philippine Languages
Within the 7,200 islands of the Philippine archipelago, about 101 languages are spoken, according to the nationwide 1995 census conducted by the National Statistics Office of the Philippine Government (NSO, 1997). The languages spoken by at least one percent of the total household population are Tagalog, Cebuano, Ilocano, Hiligaynon, Bikol, Waray, Pampanggo or Kapampangan, Boholano, Pangasinan or Panggalatok, Maranao, Maguindanao, and Tausug. Besides these major languages, there are other Philippine dialects that are variants of them. Fortunato (1993) classified these dialects under the top nine major languages listed above, except for Boholano, which is similar to Cebuano.
2 Language Representations
Linguistic information on Philippine languages is extensive for the languages mentioned above, except for Maranao, Maguindanao, and Tausug, which are among the languages spoken in the Southern Philippines. However, while extensive research has been done in theoretical linguistics, little has been done in computational linguistics. In fact, computational linguistics research on Philippine languages has focused mainly on Tagalog.¹ There is also notable work on Ilocano.
Kroeger (1993) showed the importance of grammatical relations in Tagalog, such as subject and object, and the inadequacy of a surface phrase-structure paradigm for representing them. This issue was discussed further at LFG98, in connection with the problem of voice and grammatical functions in Western Austronesian languages. Musgrave (1998) introduced the problem of certain verbs in these languages that can head more than one transitive clause type. Foley (1998) and Kroeger (1998), in particular, discussed long-debated issues such as Tagalog nouns that can be verbed, the Tagalog voice system, and Tagalog as a symmetrical voice system. Latrouite (2000) argued that a level of semantic representation is still necessary to capture a word’s meaning explicitly. Crawford (1999) addressed interrogative sentences and suggested that the restrictions on wh-movement reveal the syntactic structure of Tagalog. Potet (1995) and Trost (2000) provided general material on computational morphology, though both presented examples from Tagalog. Rubino (1996, 1997) provided an in-depth analysis of Ilocano. The major contributions of this work include an extensive treatment of the language’s complex morphology, a thorough treatment of its discourse particles, and a reference grammar of the language.
¹ Tagalog (or Pilipino) has the largest number of speakers in the country, possibly because it was officially declared the national language of the Philippines in 1946.
3 Applications in Machine Translation
Currently, most empirical work in computational linguistics on Philippine languages is in machine translation.
3.1 Filipino MT Software
Several commercially available translation packages include Philippine languages, but they translate word for word. One such package is the Universal Translator 2000, which includes Tagalog among 40 other languages.
Although it is omni-directional, its translation involving Tagalog ignores the morphological and syntactic aspects of the language. Another package is the Filipino Language Software, which covers Tagalog, Visayan, Cebuano, and Ilocano.
3.2 Machine Translation Research
IsaWika! is an English-to-Filipino machine translator that uses an augmented transition network (ATN) as its computational architecture (Roxas, 1999). It translates simple and compound declarative statements as well as imperative English statements. To date, it is the most serious machine translation research undertaking in the Philippines. Borra (1999) presented another system that translates simple declarative and imperative statements from English to Filipino. Its computational architecture is based on LFG, unlike IsaWika!’s ATN implementation. Part of the research described a possible set of semantic information for every grammatical category in order to achieve a semantically close translation.
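The morphological gap in the word-for-word systems of Section 3.1 can be illustrated with Tagalog actor-focus -um- affixation (e.g. kain → kumain, sulat → sumulat, alis → umalis). The Python sketch below is a toy and does not reconstruct any system named above. Everything in it is an assumption made for illustration: the simplified rule (prefix um- to a vowel-initial root, otherwise infix -um- after the first consonant), the four-word lexicon, and the function names `um_infix`, `strip_um`, `word_for_word`, and `with_um_analysis`. Aspect, reduplication, consonant clusters, and other affixes are ignored.

```python
from typing import Dict, List, Optional

VOWELS = set("aeiou")

# Hypothetical toy lexicon mapping Tagalog roots to English glosses.
LEXICON: Dict[str, str] = {"kain": "eat", "sulat": "write", "alis": "leave", "ako": "I"}


def um_infix(root: str) -> str:
    """Form the actor-focus -um- verb from a Tagalog root (simplified rule).

    Vowel-initial roots take the prefix um-; consonant-initial roots take
    the infix -um- after their first consonant. Consonant clusters,
    reduplication, and other affixes are ignored.
    """
    root = root.lower()
    if not root:
        raise ValueError("empty root")
    if root[0] in VOWELS:
        return "um" + root
    return root[0] + "um" + root[1:]


def strip_um(word: str) -> Optional[str]:
    """Invert the simplified rule, returning a candidate root or None.

    Heuristic only: it will also over-analyse words that merely look
    like -um- forms.
    """
    word = word.lower()
    if word.startswith("um") and len(word) > 2 and word[2] in VOWELS:
        return word[2:]
    if len(word) > 3 and word[0] not in VOWELS and word[1:3] == "um":
        return word[0] + word[3:]
    return None


def word_for_word(tokens: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Pure dictionary lookup: inflected forms are left untranslated."""
    return [lexicon.get(tok.lower(), f"<{tok}?>") for tok in tokens]


def with_um_analysis(tokens: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Dictionary lookup that first tries to undo -um- affixation."""
    result = []
    for tok in tokens:
        key = tok.lower()
        if key in lexicon:
            result.append(lexicon[key])
            continue
        root = strip_um(key)
        if root is not None and root in lexicon:
            result.append(lexicon[root])
        else:
            result.append(f"<{tok}?>")
    return result


if __name__ == "__main__":
    print(um_infix("kain"), um_infix("sulat"), um_infix("alis"))
    sentence = ["Kumain", "ako"]
    print(word_for_word(sentence, LEXICON))      # the inflected verb is missed
    print(with_um_analysis(sentence, LEXICON))   # the root kain is recovered
```

A pure lookup leaves kumain untranslated because only the root kain is in the lexicon. Undoing the affix first recovers the gloss, but the output "eat I" still lacks English word order and tense, which is the syntactic side of the same limitation.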